Goals for representing knowledge (outline)
Make things (articles, genes, antibodies, etc.) easier to find
Answer questions
Generate hypotheses
Controlled vocabularies (MeSH)
Ontologies (Gene Ontology)
Knowledge graphs on the Web: the SPARQL query language
Knowledge plus computation = inference, the ABC model
Ontology
The word comes from philosophy:
the branch of metaphysics dealing with the nature of being
In practice, an ontology is:
A set of concepts, definitions and inter-relationships.
(The dividing line between “controlled vocabulary”, “thesaurus” and “ontology” is hazy and not terribly important for practical purposes.)
We have hundreds of ontologies in biology, e.g. see:
http://www.obofoundry.org (100+)
http://bioportal.bioontology.org (500+)
The Gene Ontology
Ashburner et al., Nat Genet. 2000 May;25(1):25-9.
Started in 1999
As a collaboration between 3 Model Organism Databases
Slide credit: Mélanie Courtot, Ph.D.
A way to capture biological knowledge for individual gene products in a computable form
A set of concepts and their relationships to each other, arranged as a hierarchy
http://www.ebi.ac.uk/QuickGO
Less specific concepts
More specific concepts
The Gene Ontology
Slide credit: Mélanie Courtot, Ph.D.
1. Molecular Function
An elemental activity or task or job
protein kinase activity
insulin receptor activity
2. Biological Process
A commonly recognized series of events
cell division
3. Cellular Component
Where a gene product is located
mitochondrion
mitochondrial matrix
mitochondrial inner membrane
The GO branches
Slide credit: Mélanie Courtot, Ph.D.
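The idea of less specific and more specific concepts arranged as a hierarchy can be sketched in a few lines of Python. The term names come from the slides, but the parent links below are a simplified illustration and not a copy of the real Gene Ontology graph, which has intermediate terms and several relationship types. The gene names and annotations are hypothetical.

```python
# Toy hierarchy: each term maps to its less specific parent terms.
# Parent links are simplified assumptions for illustration only.
PARENTS = {
    "mitochondrial matrix": ["mitochondrion"],
    "mitochondrial inner membrane": ["mitochondrion"],
    "mitochondrion": ["cellular component"],
    "insulin receptor activity": ["protein kinase activity"],
    "protein kinase activity": ["molecular function"],
    "cell division": ["biological process"],
}

# Hypothetical annotations: gene product -> most specific known term.
ANNOTATIONS = {"geneA": "mitochondrial matrix", "geneB": "cell division"}


def ancestors(term, parents=PARENTS):
    """Return every less specific term reachable from `term`."""
    found = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, []):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def annotated_to(term, annotations, parents=PARENTS):
    """Gene products annotated to `term` or to any more specific term."""
    return sorted(
        gene
        for gene, gene_term in annotations.items()
        if gene_term == term or term in ancestors(gene_term, parents)
    )
```

Asking for everything annotated to "mitochondrion" also returns gene products annotated to the more specific "mitochondrial matrix". This is what the hierarchy buys you when searching.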
Knowledge Graphs
Also called “knowledge bases” to distinguish them from databases.
An integrated collection of assertions or claims, represented in a form that can be visualized as a graph and that is technically very much like a database.
RNASeq reads
Gene X is expressed
Drug A caused Gene X to be expressed
Knowing what to do with Drug A
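One common way to hold such assertions in code is as subject-predicate-object triples, where each triple is one edge of the graph. The sketch below is a minimal illustration. The predicate labels ("causes expression of", "is a") are made up for this example and do not come from any particular knowledge base.

```python
from collections import namedtuple

# One assertion = one edge: subject --predicate--> object.
Triple = namedtuple("Triple", ["subject", "predicate", "object"])

# Predicate labels are hypothetical, chosen to mirror the slide's example.
GRAPH = [
    Triple("Drug A", "causes expression of", "Gene X"),
    Triple("Gene X", "is a", "gene"),
    Triple("Drug A", "is a", "drug"),
]


def outgoing(graph, node):
    """All (predicate, object) pairs asserted about `node`."""
    return [(t.predicate, t.object) for t in graph if t.subject == node]


def nodes(graph):
    """Every entity that appears as a subject or an object."""
    return {t.subject for t in graph} | {t.object for t in graph}
```

The same list can be read as database rows (three columns) or drawn as a graph (nodes and labeled edges), which is why a knowledge graph is technically so close to a database.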
Example knowledge graphs
Wikidata: The structured equivalent of Wikipedia
http://wikidata.org
UniProt Knowledge Base: manually curated protein knowledge base
http://www.uniprot.org/uniprot/
Microsoft Knowledge Graph (“Satori”)
Google Knowledge Graph
Example: “Google Knowledge Graph” (GKG)
Vemurafenib
405,000 results
1 infobox
1 node in GKG
https://googleblog.blogspot.com/2012/05/introducing-knowledge-graph-things-not.html
Why Knowledge Graphs?
?
Answer explicit questions
Uncover implicit relations
Implicit relations for hypothesis generation
ABC model
Swanson (1986) Fish oil, Raynaud’s syndrome and undiscovered public knowledge
http://muse.jhu.edu/article/403510/pdf
[Diagram: Raynaud’s syndrome and dietary fish oil each co-occur in articles with the intermediate B terms (platelet inhibition, vasodilation, lower blood viscosity); the direct link between the two is the unknown “?”.]
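The co-occurrence reasoning can be sketched as follows. The "articles" here are invented term sets that mimic the shape of Swanson's example. They are not real literature data, and the function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Toy "articles", each reduced to the set of terms it mentions.
# Invented for illustration; not real citation data.
ARTICLES = [
    {"dietary fish oil", "platelet inhibition"},
    {"dietary fish oil", "vasodilation"},
    {"dietary fish oil", "lower blood viscosity"},
    {"Raynaud's syndrome", "platelet inhibition"},
    {"Raynaud's syndrome", "vasodilation"},
    {"Raynaud's syndrome", "lower blood viscosity"},
]


def cooccurrence(articles):
    """Map each term to the terms it shares at least one article with."""
    links = defaultdict(set)
    for terms in articles:
        for x, y in combinations(sorted(terms), 2):
            links[x].add(y)
            links[y].add(x)
    return links


def shared_intermediates(links, first, second):
    """Return (B terms linked to both ends, whether the ends co-occur directly)."""
    bridge = sorted(links[first] & links[second])
    directly_linked = second in links[first]
    return bridge, directly_linked
```

Two terms that never appear together but share several intermediates are the interesting case: the connection is implicit in the literature without being stated in any single article.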
Open Discovery and Closed Discovery
Open discovery: you start from A and do not know what B or C is (e.g. disease -> ?drug)
Closed discovery: you know A and C and are looking for the B that links them (e.g. disease -> why? -> drug)
[Diagram: nodes A, B and C, with the unknown link marked “?”.]
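The difference between the two modes shows up clearly in code. This is a minimal sketch over a hypothetical link table, with made-up node names and function names. Open discovery walks outward from a start term, while closed discovery intersects the neighbours of two known terms.

```python
# Hypothetical links: term -> terms it is connected to.
LINKS = {
    "disease": {"gene1", "gene2"},
    "gene1": {"disease", "drug1"},
    "gene2": {"disease", "drug1", "drug2"},
    "drug1": {"gene1", "gene2"},
    "drug2": {"gene2"},
}


def open_discovery(links, start):
    """Propose end terms reachable through an intermediate but not linked
    directly to `start`. Returns {end term: [intermediates]}."""
    direct = links.get(start, set())
    candidates = {}
    for middle in sorted(direct):
        for end in sorted(links.get(middle, set())):
            if end != start and end not in direct:
                candidates.setdefault(end, []).append(middle)
    return candidates


def closed_discovery(links, start, end):
    """Given both ends, return the intermediates that connect them."""
    return sorted(links.get(start, set()) & links.get(end, set()))
```

Open discovery generates candidate hypotheses, while closed discovery explains or supports a hypothesis you already have.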
Example question: drug repurposing
For a given drug, what diseases might it be used to treat?
http://www.ncbi.nlm.nih.gov/pubmed/27189611
'RE:fine drugs': an interactive dashboard to access drug repurposing opportunities.
Implicit relations for hypothesis generation
ABC model for drug repurposing
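[Diagram: a drug has a physical interaction with genes (B), and those genes have a genetic association with a disease; the direct drug-to-disease link is the unknown “?”.]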
Querying a knowledge graph with SPARQL
SPARQL: “SPARQL Protocol and RDF Query Language”
RDF: Resource Description Framework (a common standard for storing knowledge graphs)
A SPARQL query = a partially completed graph
Variables (names starting with ?) show what you are looking for
The rest constrains the search
http://www.w3.org/TR/rdf-sparql-query/
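[Diagram: the query graph is Metformin -treats-> ?disease, where ?disease is what is being asked for and “Metformin” and “treats” are the constraints. Result: Metformin -treats-> Type 2 diabetes.]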
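The "partially completed graph" idea can be imitated in plain Python without any SPARQL engine. In this sketch a pattern is a triple in which names starting with "?" are variables. Only the Metformin/Type 2 diabetes statement comes from the slide; the other triples and the function names are added for illustration.

```python
TRIPLES = [
    ("Metformin", "treats", "Type 2 diabetes"),
    ("Aspirin", "treats", "Pain"),          # extra triple, added for illustration
    ("Metformin", "instance of", "Drug"),   # extra triple, added for illustration
]


def is_variable(term):
    """Variables are written like SPARQL variables: ?name."""
    return term.startswith("?")


def match(pattern, triples):
    """Return one {variable: value} binding per triple that fits the pattern."""
    results = []
    for triple in triples:
        binding = {}
        for wanted, actual in zip(pattern, triple):
            if is_variable(wanted):
                # A repeated variable must bind to the same value each time.
                if binding.setdefault(wanted, actual) != actual:
                    break
            elif wanted != actual:
                break  # a constraint did not match
        else:
            results.append(binding)
    return results
```

Calling `match(("Metformin", "treats", "?disease"), TRIPLES)` fills in the one open slot. Replacing "Metformin" with "?drug" loosens the constraints and returns every drug-disease pair.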
https://query.wikidata.org/
Metformin’s unique id: Q19484
Treats property id: P2175
Metformin -treats-> ?disease
http://tinyurl.com/gwd6pep
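Written out with the two identifiers above, the query might look like the string built below. This is a sketch and not necessarily the exact query behind the short link. It assumes the `wd:`/`wdt:` prefixes and the label service that the Wikidata Query Service provides, and the endpoint address `https://query.wikidata.org/sparql`. The function names are hypothetical, and nothing here contacts the network.

```python
from urllib.parse import urlencode

# Assumed address of the Wikidata SPARQL endpoint.
ENDPOINT = "https://query.wikidata.org/sparql"


def treats_query(drug_id="Q19484", property_id="P2175"):
    """Build a SPARQL query: which ?disease does the drug treat?"""
    return (
        "SELECT ?disease ?diseaseLabel WHERE {\n"
        f"  wd:{drug_id} wdt:{property_id} ?disease .\n"
        '  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" . }\n'
        "}"
    )


def request_url(query):
    """URL that would submit the query and ask for JSON results."""
    return ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


if __name__ == "__main__":
    print(treats_query())
```

The middle line of the query is the partially completed graph: two fixed parts (Q19484, P2175) and one variable (?disease).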
Example question: drug repurposing
For a given drug, what diseases might it be used to treat?
[Diagram: ?drug -interacts with-> protein -encoded by-> gene -genetic association-> ?disease; the hypothesized direct link is ?drug -treats??-> ?disease.]
Example question: repurposing Metformin
http://tinyurl.com/zem3oxz
[Diagram: Metformin -interacts with-> protein Solute carrier family 22 member 3 (SLC22A3) -encoded by-> gene SLC22A3 -genetic association-> ?disease = prostate cancer; the hypothesized direct link is Metformin -treats??-> prostate cancer.]
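The chain in this example can be followed step by step over a handful of triples. The sketch below transcribes the diagram into a local list; the "protein"/"gene" suffixes on the node names are a naming choice made here to keep the two SLC22A3 nodes apart, and the function names are hypothetical. The output is a candidate hypothesis, not evidence that the drug treats the disease.

```python
# Triples transcribed from the diagram, plus the known "treats" statement.
TRIPLES = [
    ("Metformin", "interacts with", "SLC22A3 protein"),
    ("SLC22A3 protein", "encoded by", "SLC22A3 gene"),
    ("SLC22A3 gene", "genetic association", "prostate cancer"),
    ("Metformin", "treats", "Type 2 diabetes"),
]

# The sequence of edge labels to follow from drug to disease.
PATH = ["interacts with", "encoded by", "genetic association"]


def follow(triples, start, predicates):
    """Return every chain of nodes that starts at `start` and follows
    the given predicates in order."""
    chains = [[start]]
    for predicate in predicates:
        chains = [
            chain + [obj]
            for chain in chains
            for subj, pred, obj in triples
            if subj == chain[-1] and pred == predicate
        ]
    return chains


def repurposing_candidates(triples, drug, predicates=PATH):
    """Diseases reached via the path that the drug is not already known to treat."""
    known = {obj for subj, pred, obj in triples if subj == drug and pred == "treats"}
    reached = {chain[-1] for chain in follow(triples, drug, predicates)}
    return sorted(reached - known)
```

Keeping the whole chain, not just the end node, matters in practice: the intermediate protein and gene are the explanation a researcher would check first.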